Introduction: This analysis is based on the outputs of pairwise comparisons of differential gene expression generated by this template. It uses the results of 2 pairwise comparisons of 2 sample groups vs. their corresponding control groups and compares how these 2 sample groups differ from each other in terms of their sample-control differences (delta-delta). An example of such an analysis is the different responses of 2 cell types to treatment with the same drug. This analysis focuses on the overlap of differential expression at both the gene and gene set levels.

 


1 Description

1.1 Project

Transcriptome in immune cells of control-patient samples

1.2 Data

RNA-seq data were generated from 2 types of immune cells of 3 controls and 3 patients. Raw data were processed to get gene-level read counts. Pairwise comparisons were performed between controls and patients in each immune cell type.

1.3 Analysis

This is a demo.

1.4 Pairwise comparisons

This report compares the results of the following pairwise comparisons.

  • B_Cell: Control vs. SLE
  • T_Cell: Control vs. SLE

2 Gene-level comparison


2.1 Global delta-delta correlation

Both comparisons reported the log ratio of 2 group means for each gene. The global agreement of the log ratios of all genes indicates how similar or different the results of these 2 comparisons are. The full side-by-side table of gene-level statistics is here.

Figure 1. This plot shows the global correlation (correlation coefficient = 0.244) between the 2 pairwise comparisons: B_Cell and T_Cell. Genes having p values less than 0.01 from both comparisons are highlighted.
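
As a minimal sketch of how the global correlation in Figure 1 could be computed, assume the log ratios and p values of each comparison are available as numeric vectors in the same gene order. The variable names and values below are hypothetical toy data, not results from this report, and the use of Pearson correlation is an assumption.

```r
# Toy log ratios and p values for 6 genes from two pairwise comparisons
# (hypothetical values for illustration only)
genes <- paste0("gene", 1:6)
lr_b  <- c(1.2, -0.8,  0.3, 2.1, -1.5, 0.0)   # B_Cell log ratios
lr_t  <- c(0.9, -0.2, -0.4, 1.8,  0.6, 0.1)   # T_Cell log ratios
p_b   <- c(0.001, 0.200, 0.500, 0.002, 0.004, 0.900)
p_t   <- c(0.005, 0.300, 0.600, 0.001, 0.400, 0.800)

# Global agreement: correlation of log ratios across all genes
# (cor() defaults to Pearson; assumed here)
r_global <- cor(lr_b, lr_t)

# Genes to highlight: p < 0.01 in both comparisons
highlight <- genes[p_b < 0.01 & p_t < 0.01]
print(r_global)
print(highlight)
```

A gene is highlighted only when it passes the p value cutoff in both comparisons, so a gene that is significant in just one cell type is left unmarked.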

2.2 Differentially expressed genes (DEGs)

Both comparisons identified DEGs between their 2 compared groups. DEGs identified by both comparisons are worth a closer look.

Table 1. Number of DEGs:
                         B_Cell  T_Cell
Higher in the 2nd group     984     693
Lower in the 2nd group     1853     671

Figure 2. Overlap of DEGs. All combinations of differential expression directions, including opposite directions, are plotted, and Fisher’s exact test is performed to evaluate the significance of the overlap or lack of overlap.
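
The overlap test described for Figure 2 can be sketched with base R's fisher.test() on a 2x2 table of DEG status in one comparison vs. the other. The gene universe and DEG lists below are hypothetical toy data, and this is only an illustration of the test, not necessarily how the template builds its tables.

```r
# Hypothetical gene universe and DEG calls (toy data, not from this report)
universe <- paste0("gene", 1:20)
deg_b <- paste0("gene", 1:8)    # e.g. DEGs in one direction in B cells
deg_t <- paste0("gene", 5:12)   # e.g. DEGs in one direction in T cells

# 2x2 contingency table: DEG status in one comparison vs. the other
in_b <- universe %in% deg_b
in_t <- universe %in% deg_t
tab  <- table(B_Cell = in_b, T_Cell = in_t)

# Fisher's exact test: an odds ratio above 1 suggests more overlap than
# expected by chance, below 1 suggests less overlap than expected
ft <- fisher.test(tab)
overlap <- intersect(deg_b, deg_t)
print(tab)
print(ft$p.value)
```

The same table can be rebuilt for each pair of directions (for example, higher in one comparison and lower in the other) to test for a lack of overlap.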

2.3 ANOVA

Two-way ANOVA is performed to identify genes that respond to SLE differently in different cell types. The analysis reports 3 p values, corresponding to the effects of SLE, Cell, and their interaction. It identified 1127 significant genes with interaction p values less than 0.01. The ANOVA results are summarized in a table here.
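
For a single gene, the two-way ANOVA described above could look like the sketch below, which uses base R's aov() with an interaction term. The expression values are simulated, and the factor names disease and cell are hypothetical; the template's own implementation may differ.

```r
set.seed(1)  # simulated toy data for one gene, not from this report

# 2 cell types x 2 disease states x 3 replicates = 12 samples
design <- expand.grid(rep     = 1:3,
                      disease = c("Control", "SLE"),
                      cell    = c("B_Cell", "T_Cell"))

# Simulate a gene that responds to SLE in B cells only (an interaction)
effect <- ifelse(design$disease == "SLE" & design$cell == "B_Cell", 2, 0)
design$expr <- rnorm(nrow(design)) + effect

# Two-way ANOVA with main effects and their interaction
fit <- aov(expr ~ disease * cell, data = design)
tab <- summary(fit)[[1]]

# Three p values: disease, cell, and disease:cell interaction
p <- setNames(tab[["Pr(>F)"]], trimws(rownames(tab)))
p <- p[c("disease", "cell", "disease:cell")]
print(p)
```

Running this per gene and keeping genes whose interaction p value falls below the cutoff gives the kind of gene list summarized above.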

Figure 3. Examples: the 4 genes with the most significant interaction p values among the genes with significant differential expression in at least one of the two pairwise comparisons.

3 Gene set-level comparison

Genes are often grouped into pre-defined gene sets according to their function, interaction, location, etc. Analysis can then be performed on the genes in the same gene set as a unit instead of on individual genes.

3.1 Gene set average

This section reports the average differential expression of genes in the same gene set. The gene set-level statistics are fully summarized in this table here.

Figure 4. Each dot represents a gene set and the average log-ratio of all genes in this gene set. The averages were calculated with the log-ratio values of genes (left panel) and the absolute values of the log-ratios (right panel). The correlation coefficients are 0.4668 and 0.4779 respectively.
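
The two kinds of gene set average in Figure 4 can be illustrated with a small sketch. The gene names, log ratios, and gene sets below are hypothetical toy values.

```r
# Toy log ratios for 4 genes and 2 gene sets (hypothetical values)
log_ratio <- c(geneA = 1.0, geneB = -1.0, geneC = 0.5, geneD = 2.0)
gene_sets <- list(set1 = c("geneA", "geneB"),
                  set2 = c("geneC", "geneD"))

# Average of signed log ratios: opposite changes cancel out
set_mean <- sapply(gene_sets, function(g) mean(log_ratio[g]))

# Average of absolute log ratios: measures magnitude regardless of direction
set_mean_abs <- sapply(gene_sets, function(g) mean(abs(log_ratio[g])))

print(rbind(set_mean, set_mean_abs))
```

In this toy case set1 has a signed average of 0 but an absolute average of 1, which is why the two panels can tell different stories for the same gene set.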

3.2 Gene set over-representation analysis (ORA)

Each 2-group comparison performs gene set over-representation analysis (ORA), which identifies gene sets over-represented with differentially expressed genes. The ORA results of both 2-group comparisons are summarized and compared here. The ORA of each gene set reports an odds ratio and a p value. These statistics from both comparisons were combined and listed side-by-side, along with the difference of their odds ratios and the ratio of their p values (p set to 0.5 when not available), in this table here.
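
A minimal sketch of the side-by-side combination described above is shown here. The two input tables, their column names, and all values are hypothetical, and the direction of the difference and ratio (T_Cell relative to B_Cell) is an assumption.

```r
# Hypothetical ORA results from each comparison (toy values)
ora_b <- data.frame(gene_set   = c("setA", "setB", "setC"),
                    odds_ratio = c(3.0, 1.2, 0.8),
                    p          = c(0.001, 0.20, 0.70))
ora_t <- data.frame(gene_set   = c("setA", "setB", "setD"),
                    odds_ratio = c(2.5, 4.0, 1.5),
                    p          = c(0.01, 0.002, 0.30))

# Side-by-side table keeping gene sets reported by either comparison
both <- merge(ora_b, ora_t, by = "gene_set", all = TRUE,
              suffixes = c("_B", "_T"))

# p value set to 0.5 when not available
both$p_B[is.na(both$p_B)] <- 0.5
both$p_T[is.na(both$p_T)] <- 0.5

# Difference of odds ratios and ratio of p values (direction assumed)
both$or_diff <- both$odds_ratio_T - both$odds_ratio_B
both$p_ratio <- both$p_T / both$p_B
print(both)
```

In this sketch the odds ratio difference stays missing for gene sets tested in only one comparison, since only the p value has a stated default.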

Table 2. Gene sets were broken down into subgroups by their sources. Click on the numbers of over-represented gene sets to see a full list.
Source      B_Cell::Higher_in_Control  B_Cell::Higher_in_SLE  T_Cell::Higher_in_Control  T_Cell::Higher_in_SLE
BioSystems                        438                   3212                        438                   3212
KEGG                               40                    319                         40                    319
MSigDb                            857                   4125                        857                   4125
OMIM                                0                      1                          0                      1
PubTator                          123                   7634                        123                   7634

Figure 5. The overlap of over-represented gene sets from both comparisons.

3.3 Gene set enrichment analysis (GSEA)

Each 2-group comparison performs gene set enrichment analysis (GSEA) on genes ranked by their differential expression. The GSEA results of both 2-group comparisons are summarized and compared here. The GSEA of each gene set reports an enrichment score and a p value. These statistics from both comparisons were combined and listed side-by-side in this table here.
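
To illustrate the idea of an enrichment score on a ranked gene list, the sketch below computes a simplified, unweighted running-sum statistic. It is a teaching example under stated assumptions, not the GSEA implementation used by this template: it has no weighting, normalization, or permutation-based p value, and the function name enrichment_score is hypothetical.

```r
# Simplified, unweighted enrichment score (hypothetical helper):
# walk down the ranked list, step up at genes in the set and down otherwise,
# and report the running sum with the largest absolute value
enrichment_score <- function(ranked_genes, gene_set) {
  in_set  <- ranked_genes %in% gene_set
  n_hit   <- sum(in_set)
  n_miss  <- length(ranked_genes) - n_hit
  step    <- ifelse(in_set, 1 / n_hit, -1 / n_miss)
  running <- cumsum(step)
  running[which.max(abs(running))]
}

# Toy ranked list: gene1 is the most up-regulated, gene10 the most down-regulated
ranked <- paste0("gene", 1:10)
es_top    <- enrichment_score(ranked, c("gene1", "gene2", "gene3"))
es_bottom <- enrichment_score(ranked, c("gene8", "gene9", "gene10"))
print(c(top = es_top, bottom = es_bottom))
```

A gene set concentrated at the top of the ranking gets a positive score and one concentrated at the bottom gets a negative score, which is the sign convention assumed in this sketch.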

Table 3. Gene sets were broken down into subgroups by collections. Click on the numbers of enriched gene sets to see a full list.
Collection                              B_Cell::Higher_in_Control  B_Cell::Higher_in_SLE  T_Cell::Higher_in_Control  T_Cell::Higher_in_SLE
C0_Hallmark                                                     2                     37                          2                     37
C1_Positional                                                  13                     26                         13                     26
C2_BioCarta_Pathways                                            1                     68                          1                     68
C2_Chemical_and_genetic_perturbations                          36                   1356                         36                   1356
C3_MicroRNA_targets                                             0                     51                          0                     51
C3_TF_targets                                                   4                    284                          4                    284
C4_Cancer_gene_neighborhoods                                   42                     86                         42                     86
C4_Cancer_modules                                              10                    176                         10                    176
C6_Oncogenic_signatures                                         2                    116                          2                    116
C7_Immunologic_signatures                                      58                    922                         58                    922
GO_BP                                                         145                   2065                        145                   2065
GO_CC                                                          67                    159                         67                    159
GO_MF                                                          44                    359                         44                    359
KEGG_compound                                                   4                    126                          4                    126
KEGG_enzyme                                                     1                      1                          1                      1
KEGG_module                                                    11                     13                         11                     13
KEGG_pathway                                                    9                    161                          9                    161
KEGG_reaction                                                   2                     35                          2                     35
OMIM_gene                                                       1                      2                          1                      2
REACTOME                                                       92                    283                         92                    283
WikiPathways                                                    2                     91                          2                     91
Figure 6. Normalized enrichment scores (NES) from both comparisons. Each dot represents a gene set. Gene sets with p values less than 0.01 from both comparisons are highlighted.

Figure 7. The overlap of significantly enriched gene sets from GSEA of both comparisons.

3.4 Gene clustering

The top 1000 genes with significant ANOVA p values (p <= the cutoff given by the parameter prms$geneset$cluster$panova) were used as seeds to perform a gene-gene clustering analysis, and 5 clusters were identified. ORA was performed on the clusters to identify their functional associations (see table below).

Table 4. This table lists the number of genes in each cluster (click the numbers to see gene lists), the average expression of all genes in the cluster in each sample group, and the number of gene sets over-represented in each cluster (click the numbers to see gene set lists). The gene expression levels were normalized so that the mean of each control group equals 0 and the mean of each treatment group is given as the number of standard deviations from its control.
ID         Size  B_Cell::Control  B_Cell::SLE  T_Cell::Control  T_Cell::SLE  Gene_set
Cluster_1   895                0      -1.6039                0      -1.5890      4328
Cluster_2   781                0       1.3818                0      -1.3717       830
Cluster_3  1000                0       1.6673                0       1.5826      2324
Cluster_4    81                0       0.2312                0       1.5631      1290
Cluster_5   396                0      -1.4606                0       1.3153      3410
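
The report does not state which clustering algorithm was used. As one possible illustration of gene-gene clustering, the sketch below applies hierarchical clustering with a correlation-based distance to a simulated expression matrix; the choice of hclust(), average linkage, and all data are assumptions.

```r
set.seed(42)  # simulated toy data, not from this report

# 5 hypothetical expression patterns across 12 samples, 6 noisy genes each
n_samples <- 12
patterns  <- matrix(rnorm(5 * n_samples), nrow = 5)
expr <- patterns[rep(1:5, each = 6), ] +
  matrix(rnorm(30 * n_samples, sd = 0.1), nrow = 30)
rownames(expr) <- paste0("gene", 1:30)

# Gene-gene distance: 1 - Pearson correlation of expression profiles
d <- as.dist(1 - cor(t(expr)))

# Hierarchical clustering, cut into 5 clusters
hc      <- hclust(d, method = "average")
cluster <- cutree(hc, k = 5)
print(table(cluster))
```

Each resulting cluster could then be passed to ORA as a gene list, which is how the gene set counts in the last column of the table above would be obtained.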
Figure 8. This plot shows the average expression level of each cluster. Data were normalized before the analysis so that the mean of each control group was zero and the standard deviation of all samples of each gene was 1.0. Values indicate the number of standard deviations from the mean of the corresponding control group.
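
The normalization described for Figure 8 can be sketched for a single gene within one cell type as follows. The function name normalize_to_control and the expression values are hypothetical, and how the standard deviation is pooled across cell types in the actual template is not stated, so this is an assumption.

```r
# Center a gene on its control mean and scale by the SD of all samples
# (hypothetical helper; x = expression values, is_control = logical mask)
normalize_to_control <- function(x, is_control) {
  (x - mean(x[is_control])) / sd(x)
}

# Toy expression values: 3 controls followed by 3 patients
x          <- c(5, 6, 7, 9, 10, 11)
is_control <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

z <- normalize_to_control(x, is_control)
print(mean(z[is_control]))    # control mean after normalization
print(mean(z[!is_control]))   # patient mean in SD units from control
```

After this step the control mean is 0 by construction, so the patient mean reads directly as a number of standard deviations above or below control.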
Figure 9. This plot summarizes the group means and standard errors of all clusters.

4 Appendix

Check out the RoCA home page for more information.

4.1 Reproduce this report

To reproduce this report:

  1. Find the data analysis template you want to use and an example of its paired YAML file here, and download the YAML example to your working directory.

  2. To generate a new report using your own input data and parameters, edit the following items in the YAML file:

    • output : where you want to put the output files
    • home : the URL if you have a home page for your project
    • analyst : your name
    • description : background information about your project, analysis, etc.
    • input : where your input data are; read the instructions for preparing them
    • parameter : parameters for this analysis; read the instructions about how to prepare input data
  3. Run the code below within the R Console or RStudio, preferably in a new R session:

if (!require(devtools)) { install.packages('devtools'); require(devtools); }
if (!require(RCurl)) { install.packages('RCurl'); require(RCurl); }
if (!require(RoCA)) { install_github('zhezhangsh/RoCAR'); require(RoCA); }

filename.yaml <- 'path/to/your_file.yaml';  # placeholder path: replace with the YAML file you just downloaded and edited
CreateReport(filename.yaml);

If no errors are reported, go to the output folder and open the index.html file to view the report.

4.2 Session information

## R version 3.2.2 (2015-08-14)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.10.5 (Yosemite)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] DEGandMore_0.0.0.9000 snow_0.4-1            rchive_0.0.0.9000    
##  [4] gplots_3.0.1          MASS_7.3-45           htmlwidgets_0.6      
##  [7] DT_0.1                awsomics_0.0.0.9000   yaml_2.1.13          
## [10] rmarkdown_0.9.6       knitr_1.13            RoCA_0.0.0.9000      
## [13] RCurl_1.95-4.8        bitops_1.0-6          devtools_1.12.0      
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.5        magrittr_1.5       highr_0.6         
##  [4] stringr_1.0.0      caTools_1.17.1     tools_3.2.2       
##  [7] parallel_3.2.2     KernSmooth_2.23-15 withr_1.0.2       
## [10] htmltools_0.3.5    gtools_3.5.0       digest_0.6.9      
## [13] formatR_1.4        memoise_1.0.0      evaluate_0.9      
## [16] gdata_2.17.0       stringi_1.1.1      jsonlite_0.9.22
